# load necessary packages
library(tidyverse)
library(mosaic)
library(DataComputing)
library(ggplot2)
What factors do all MLB Hall of Fame players have in common?
This document is required to indicate where the various requirements can be found within your Final Project Report Rmd. For each required task below, you must give the line numbers where it appears in your final Rmd document. Points will be deducted if line numbers are missing or differ significantly from the submitted final Rmd document.
Description: (1) Analysis includes at least two different data sources. (2) Primary data source may NOT be loaded from an R package, though supporting data may be. (3) Access to all data sources is contained within the analysis. (4) Imported data is inspected at the beginning of the analysis using one or more R functions, e.g., str, glimpse, head, tail, names, nrow, etc.
HallOfFame <- read_csv("core/HallOfFame.csv")
Parsed with column specification:
cols(
  playerID = col_character(),
  yearID = col_double(),
  votedBy = col_character(),
  ballots = col_double(),
  needed = col_double(),
  votes = col_double(),
  inducted = col_character(),
  category = col_character(),
  needed_note = col_character()
)
AllstarFull <- read_csv("core/AllstarFull.csv")
Parsed with column specification:
cols(
  playerID = col_character(),
  yearID = col_double(),
  gameNum = col_double(),
  gameID = col_character(),
  teamID = col_character(),
  lgID = col_character(),
  GP = col_double(),
  startingPos = col_double()
)
Salaries <- read_csv("core/Salaries.csv")
Parsed with column specification:
cols(
  yearID = col_double(),
  teamID = col_character(),
  lgID = col_character(),
  playerID = col_character(),
  salary = col_double()
)
Batting <- read_csv("core/Batting.csv")
Parsed with column specification:
cols(
  .default = col_double(),
  playerID = col_character(),
  teamID = col_character(),
  lgID = col_character(),
  IBB = col_logical(),
  HBP = col_logical(),
  SH = col_logical(),
  SF = col_logical()
)
See spec(...) for full column specifications.
87292 parsing failures.
 row col           expected actual               file
1999 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2001 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2020 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2022 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2027 HBP 1/0/T/F/TRUE/FALSE      5 'core/Batting.csv'
.... ... .................. ...... ..................
See problems(...) for more details.
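The 87292 parsing failures most likely come from type guessing: `read_csv()` guessed `col_logical()` for `IBB`, `HBP`, `SH`, and `SF` from the first rows of the file (readr's default `guess_max` is 1000), presumably because those early rows are empty, so later counts such as 2 or 5 cannot be stored and become `NA`. The columns used later in this analysis (`playerID`, `yearID`, `G`) do not appear to be affected. One way to avoid the failures, sketched below on a small generated file (the file, its columns, and the 1,500 empty rows are hypothetical), is to declare the column types instead of letting `read_csv()` guess them; raising `guess_max` is another option.

```r
library(readr)

# Hypothetical stand-in for core/Batting.csv: HBP is empty (NA) in the first
# 1,500 rows and holds counts afterwards, the pattern that can lead read_csv()
# to guess col_logical() when it only inspects the first rows.
demo_csv_path <- tempfile(fileext = ".csv")
batting_csv_demo <- data.frame(
  playerID = paste0("player", 1:2000),
  G        = rep(30, 2000),
  HBP      = c(rep(NA, 1500), rep(2, 500))
)
write_csv(batting_csv_demo, demo_csv_path)

# Declare the column types explicitly instead of relying on guessing
batting_typed <- read_csv(
  demo_csv_path,
  col_types = cols(
    .default = col_double(),    # counts such as G and HBP
    playerID = col_character()  # ID column stays as text
  )
)

nrow(problems(batting_typed))   # 0 parsing failures
class(batting_typed$HBP)        # "numeric"
sum(!is.na(batting_typed$HBP))  # 500 counts kept
```

For the real file, the same `col_types` specification would also need `teamID` and `lgID` declared as `col_character()`, matching the specification printed above.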
head(HallOfFame)
glimpse(HallOfFame)
head(Salaries)
glimpse(Salaries)
Description: Students need not use every function and method introduced in STAT 184, but clear demonstration of proficiency should include proper use of 5 out of the following 8 topics from class: (+) various data verbs for general data wrangling like filter, mutate, summarise, arrange, group_by, etc. (+) joins for multiple data tables. (+) spread & gather to stack/unstack variables (+) regular expressions (+) reduction and/or transformation functions like mean, sum, max, min, n(), rank, pmin, etc. (+) user-defined functions (+) loops and control flow (+) machine learning
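# Hall of Fame inductees who were inducted as players (excludes managers, umpires, etc.)
InductedP <-
  HallOfFame %>%
  filter(inducted == "Y", category == "Player") %>%
  select(playerID, yearID)
InductedP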
Money <-
  Salaries %>%
  select(teamID, playerID, salary)
Money
# Keep only inductees who also appear in the salary table (one row per salary record)
HallMoney <-
  InductedP %>%
  inner_join(Money, by = "playerID")
HallMoney
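Because `inner_join()` keeps only rows whose `playerID` appears in both tables, any inductee without a salary record is silently dropped from `HallMoney`. In the Lahman data the `Salaries` table starts in 1985, so inductees whose careers ended before then are likely among the dropped players. The sketch below uses small made-up tibbles (the IDs, years, and salaries are hypothetical, not Lahman values) to show how `anti_join()` can list the players the inner join discards.

```r
library(dplyr)

# Hypothetical stand-ins for InductedP and Money (made-up IDs and salaries)
inducted_demo <- tibble(
  playerID = c("playerA", "playerB", "playerC"),
  yearID   = c(1999, 2005, 2010)
)
money_demo <- tibble(
  teamID   = c("T1", "T1", "T2"),
  playerID = c("playerB", "playerB", "playerC"),
  salary   = c(1e6, 2e6, 5e5)
)

# One row per matching salary record; playerA has none and disappears
joined_demo <- inner_join(inducted_demo, money_demo, by = "playerID")

# Inductees that the inner join dropped
dropped_demo <- anti_join(inducted_demo, money_demo, by = "playerID")

joined_demo   # 3 rows: playerB twice, playerC once
dropped_demo  # 1 row: playerA
```

Reporting how many players `anti_join()` returns alongside `HallMoney` makes it clear which part of the Hall of Fame the salary comparison actually covers.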
# Average salary per inducted player across all of their salary records
AvgHS <-
  HallMoney %>%
  group_by(playerID) %>%
  summarise(Salary = mean(salary))
AvgHS
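`AvgHS` keeps only the mean. A sketch that adds a few other reduction and ranking functions (`max()`, `n()`, `rank()`) to the same `group_by()`/`summarise()` pattern might look like this; the data and the extra column names (`MaxSalary`, `Records`, `SalaryRank`) are hypothetical. Keep in mind that `salary` is in nominal dollars, so averages from different eras are not directly comparable.

```r
library(dplyr)

# Hypothetical HallMoney-like rows (one per salary record), not real Lahman data
hall_money_demo <- tibble(
  playerID = c("playerB", "playerB", "playerC", "playerD", "playerD", "playerD"),
  salary   = c(1e6, 3e6, 5e5, 2e6, 2e6, 8e6)
)

salary_summary_demo <- hall_money_demo %>%
  group_by(playerID) %>%
  summarise(
    Salary    = mean(salary, na.rm = TRUE),  # average salary, as in AvgHS
    MaxSalary = max(salary, na.rm = TRUE),   # single highest salary
    Records   = n()                          # number of salary records
  ) %>%
  mutate(SalaryRank = rank(desc(Salary))) %>%  # 1 = highest average salary
  arrange(SalaryRank)

salary_summary_demo  # rows ordered playerD, playerB, playerC
```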
# Players in the whole league: every Batting row (one per player, season, and stint)
# with more than 20 games played
WholeLeague <-
  Batting %>%
  filter(G > 20) %>%
  select(playerID, yearID)
WholeLeague
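The Lahman `Batting` table has one row per player, season, and stint (the `stint` column counts team changes within a season), so `filter(G > 20)` tests games per stint rather than per season: a player traded mid-season could be left out despite playing more than 20 games in total, or appear twice for the same year. A sketch that sums games per player-season before applying the cutoff, on made-up rows and assuming dplyr 1.0.0 or later for the `.groups` argument:

```r
library(dplyr)

# Hypothetical Batting-like rows: playerX was traded mid-season (two stints)
batting_stints_demo <- tibble(
  playerID = c("playerX", "playerX", "playerY", "playerZ"),
  yearID   = c(2001, 2001, 2001, 2001),
  stint    = c(1, 2, 1, 1),
  G        = c(15, 12, 40, 10)
)

# Filtering stint rows directly keeps only playerY (no single stint of playerX exceeds 20)
per_stint_demo <- batting_stints_demo %>%
  filter(G > 20) %>%
  select(playerID, yearID)

# Add up games across stints first, then apply the 20-game cutoff per season
per_season_demo <- batting_stints_demo %>%
  group_by(playerID, yearID) %>%
  summarise(G = sum(G), .groups = "drop") %>%
  filter(G > 20) %>%
  select(playerID, yearID)

per_stint_demo   # playerY only
per_season_demo  # playerX (27 games) and playerY (40 games)
```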